review quality
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Counterfactual Evaluation of Peer-Review Assignment Policies
Peer review assignment algorithms aim to match research papers to suitable expert reviewers, working to maximize the quality of the resulting reviews. A key challenge in designing effective assignment policies is evaluating how changes to the assignment algorithm map to changes in review quality. In this work, we leverage recently proposed policies that introduce randomness in peer-review assignment--in order to mitigate fraud--as a valuable opportunity to evaluate counterfactual assignment policies. Specifically, we exploit how such randomized assignments provide a positive probability of observing the reviews of many assignment policies of interest. To address challenges in applying standard off-policy evaluation methods, such as violations of positivity, we introduce novel methods for partial identification based on monotonicity and Lipschitz smoothness assumptions for the mapping between reviewer-paper covariates and outcomes. We apply our methods to peer-review data from two computer science venues: the TPDP'21 workshop (95 papers and 35 reviewers) and the AAAI'22 conference (8,450 papers and 3,145 reviewers). We consider estimates of (i) the effect on review quality when changing weights in the assignment algorithm, e.g., weighting reviewers' bids vs. textual similarity (between the review's past papers and the submission), and (ii) the cost of randomization, capturing the difference in expected quality between the perturbed and unperturbed optimal match. We find that placing higher weight on text similarity results in higher review quality and that introducing randomization in the reviewer-paper assignment only marginally reduces the review quality. Our methods for partial identification may be of independent interest, while our off-policy approach can likely find use in evaluating a broad class of algorithmic matching systems.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
ReviewRL: Towards Automated Scientific Review with RL
Zeng, Sihang, Tian, Kai, Zhang, Kaiyan, wang, Yuru, Gao, Junqi, Liu, Runze, Yang, Sa, Li, Jingxuan, Long, Xinwei, Ma, Jiaheng, Qi, Biqing, Zhou, Bowen
Peer review is essential for scientific progress but faces growing challenges due to increasing submission volumes and reviewer fatigue. Existing automated review approaches struggle with factual accuracy, rating consistency, and analytical depth, often generating superficial or generic feedback lacking the insights characteristic of high-quality human reviews. We introduce ReviewRL, a reinforcement learning framework for generating comprehensive and factually grounded scientific paper reviews. Our approach combines: (1) an ArXiv-MCP retrieval-augmented context generation pipeline that incorporates relevant scientific literature, (2) supervised fine-tuning that establishes foundational reviewing capabilities, and (3) a reinforcement learning procedure with a composite reward function that jointly enhances review quality and rating accuracy. Experiments on ICLR 2025 papers demonstrate that ReviewRL significantly outperforms existing methods across both rule-based metrics and model-based quality assessments. ReviewRL establishes a foundational framework for RL-driven automatic critique generation in scientific discovery, demonstrating promising potential for future development in this domain. The implementation of ReviewRL will be released at GitHub.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.99)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
- (2 more...)
Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards
Kim, Jaeho, Lee, Yunseok, Lee, Seulki
The peer review process in major artificial intelligence (AI) conferences faces unprecedented challenges with the surge of paper submissions (exceeding 10,000 submissions per venue), accompanied by growing concerns over review quality and reviewer responsibility. This position paper argues for the need to transform the traditional one-way review system into a bi-directional feedback loop where authors evaluate review quality and reviewers earn formal accreditation, creating an accountability framework that promotes a sustainable, high-quality peer review system. The current review system can be viewed as an interaction between three parties: the authors, reviewers, and system (i.e., conference), where we posit that all three parties share responsibility for the current problems. However, issues with authors can only be addressed through policy enforcement and detection tools, and ethical concerns can only be corrected through self-reflection. As such, this paper focuses on reforming reviewer accountability with systematic rewards through two key mechanisms: (1) a two-stage bi-directional review system that allows authors to evaluate reviews while minimizing retaliatory behavior, (2)a systematic reviewer reward system that incentivizes quality reviewing. We ask for the community's strong interest in these problems and the reforms that are needed to enhance the peer review process.
- Asia > South Korea > Ulsan > Ulsan (0.04)
- North America > Canada (0.04)
- Overview (1.00)
- Research Report > Experimental Study (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.74)
- (4 more...)
CRScore: Grounding Automated Evaluation of Code Review Comments in Code Claims and Smells
Naik, Atharva, Alenius, Marcus, Fried, Daniel, Rose, Carolyn
The task of automated code review has recently gained a lot of attention from the machine learning community. However, current review comment evaluation metrics rely on comparisons with a human-written reference for a given code change (also called a diff), even though code review is a one-to-many problem like generation and summarization with many "valid reviews" for a diff. To tackle these issues we develop a CRScore - a reference-free metric to measure dimensions of review quality like conciseness, comprehensiveness, and relevance. We design CRScore to evaluate reviews in a way that is grounded in claims and potential issues detected in the code by LLMs and static analyzers. We demonstrate that CRScore can produce valid, fine-grained scores of review quality that have the greatest alignment with human judgment (0.54 Spearman correlation) and are more sensitive than reference-based metrics. We also release a corpus of 2.6k human-annotated review quality scores for machine-generated and GitHub review comments to support the development of automated metrics.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > Dominican Republic (0.04)
- (4 more...)
- Research Report > Experimental Study (0.67)
- Research Report > New Finding (0.46)
Evaluating the "Learning on Graphs" Conference Experience
Rieck, Bastian, Coupette, Corinna
With machine learning conferences growing ever larger, and reviewing processes becoming increasingly elaborate, more data-driven insights into their workings are required. In this report, we present the results of a survey accompanying the first "Learning on Graphs" (LoG) Conference. The survey was directed to evaluate the submission and review process from different perspectives, including authors, reviewers, and area chairs alike. The first "Learning on Graphs" (LoG) Conference (9-12 December, 2022) was remarkable in more ways than one: starting from scratch, the conference aims to be the place for graph learning research, making use of an advisory committee that consists of international experts in the field. Moreover, at is core, LoG wants to be known for its exceptional review quality.
- Research Report (0.64)
- Questionnaire & Opinion Survey (0.46)
- Personal (0.46)
- Overview (0.46)
Avoiding a Tragedy of the Commons in the Peer Review Process
Sculley, D, Snoek, Jasper, Wiltschko, Alex
Peer review is the foundation of scientific publication, and the task of reviewing has long been seen as a cornerstone of professional service. However, the massive growth in the field of machine learning has put this community benefit under stress, threatening both the sustainability of an effective review process and the overall progress of the field. In this position paper, we argue that a tragedy of the commons outcome may be avoided by emphasizing the professional aspects of this service. In particular, we propose a rubric to hold reviewers to an objective standard for review quality. In turn, we also propose that reviewers be given appropriate incentive. As one possible such incentive, we explore the idea of financial compensation on a per-review basis. We suggest reasonable funding models and thoughts on long term effects.
Simulation Study on a New Peer Review Approach
Steppi, Albert, Qu, Jinchan, Tao, Minjing, Zhao, Tingting, Pang, Xiaodong, Zhang, Jinfeng
The increasing volume of scientific publications and grant proposals has generated an unprecedentedly high workload to scientific communities. Consequently, review quality has been decreasing and review outcomes have become less correlated with the real merits of the papers and proposals. A novel distributed peer review (DPR) approach has recently been proposed to address these issues. The new approach assigns principal investigators (PIs) who submitted proposals (or papers) to the same program as reviewers. Each PI reviews and ranks a small number (such as seven) of other PIs' proposals. The individual rankings are then used to estimate a global ranking of all proposals using the Modified Borda Count (MBC). In this study, we perform simulation studies to investigate several parameters important for the decision making when adopting this new approach. We also propose a new method called Concordance Index-based Global Ranking (CIGR) to estimate global ranking from individual rankings. An efficient simulated annealing algorithm is designed to search the optimal Concordance Index (CI). Moreover, we design a new balanced review assignment procedure, which can result in significantly better performance for both MBC and CIGR methods. We found that CIGR performs better than MBC when the review quality is relatively high. As review quality and review difficulty are tightly correlated, we constructed a boundary in the space of review quality vs review difficulty that separates the CIGR-superior and MBC-superior regions. Finally, we propose a multi-stage DPR strategy based on CIGR, which has the potential to substantially improve the overall review performance while reducing the review workload.
- North America > United States > Ohio > Franklin County > Columbus (0.04)
- North America > United States > Florida > Leon County > Tallahassee (0.04)